University of San Francisco
Thus far, we’ve talked about “linear” regression.
Today, we will talk about “logistic” regression.
You’ll be able to perform, interpret, and communicate:
DV can be either 1 or 0
rm(list = ls())
# List of packages
pkgs <- c('dplyr', 'ggplot2')
# Check for packages that are not installed
new.pkgs <- pkgs[!(pkgs %in% installed.packages()[,"Package"])]
# Install the ones we need
if(length(new.pkgs)) install.packages(new.pkgs)
#Load them all in
lapply(pkgs, library, character.only = TRUE)
# Remove lists
rm(pkgs, new.pkgs)
age.data <- data.frame(
age = c(27,30,32,33,35,40,44,45,50,58,59,60),
buyer = c(0,0,1,0,0,1,0,1,1,1,1,1))
lm(data = age.data,
formula = buyer ~ age) %>%
summary() %>% #Summarize the model
coef() %>% # Take just coefficients
round(4) # Round to 4 decimal places
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.6571     0.4530 -1.4504   0.1776
age           0.0290     0.0102  2.8327   0.0178
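One way to see why a straight line is awkward for a 0/1 outcome is to ask the linear model for predictions near the edges of the age range. The sketch below refits the same model in base R and predicts for two ages, 20 and 65, which are chosen here purely for illustration. The fitted values land below 0 and above 1, so they cannot be read as probabilities.

```r
# Same data as above
age.data <- data.frame(
  age = c(27,30,32,33,35,40,44,45,50,58,59,60),
  buyer = c(0,0,1,0,0,1,0,1,1,1,1,1))

# Linear model of buyer on age
linear.reg <- lm(buyer ~ age, data = age.data)

# Illustrative ages (an assumption, not taken from the slides)
edge.ages <- data.frame(age = c(20, 65))

# Raw fitted values from the linear model
edge.pred <- predict(linear.reg, newdata = edge.ages)
round(edge.pred, 3)
```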
glm(data = age.data,
formula = buyer ~ age,
family = 'binomial') %>%
summary() %>% #Summarize the model
coef() %>% # Take just coefficients
round(4) # Round to 4 decimal places
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -7.5879     4.2767 -1.7742   0.0760
age           0.1969     0.1099  1.7913   0.0732
WTF WAS THAT???
What you need to know about coefficients:
Use the predict function to understand specific predictions
How do we interpret each result?
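A minimal sketch of one common way to read the logistic output, using base R only. The age coefficient is on the log-odds scale, so exponentiating it gives an odds ratio per additional year of age, and plogis() (the inverse logit) turns a linear predictor into a probability. The example age of 40 is an assumption picked for illustration.

```r
# Same data as above
age.data <- data.frame(
  age = c(27,30,32,33,35,40,44,45,50,58,59,60),
  buyer = c(0,0,1,0,0,1,0,1,1,1,1,1))

logistic.reg <- glm(buyer ~ age, data = age.data, family = 'binomial')
b <- coef(logistic.reg)

# Odds ratio: multiplicative change in the odds of buying per extra year of age
odds.ratio <- unname(exp(b['age']))

# Probability for a 40-year-old, computed by hand from the coefficients ...
p.manual <- unname(plogis(b['(Intercept)'] + b['age'] * 40))

# ... and with predict(); the two should agree
p.predict <- unname(predict(logistic.reg,
                            newdata = data.frame(age = 40),
                            type = 'response'))

round(c(odds.ratio = odds.ratio, p.manual = p.manual, p.predict = p.predict), 3)
```

With the age coefficient of 0.1969 reported above, the odds ratio is about exp(0.1969), or roughly 1.22: each extra year of age multiplies the odds of buying by about 1.22.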
buyer has to be 0 or 1! So set it as such:
   age buyer.linear
1   25            0
2   28            0
3   30            0
4   35            0
5   40            1
6   48            1
7   52            1
8   60            1
9   65            1
10  65            1
buyer has to be 0 or 1! So set it as such:
   age buyer.linear buyer.logistic
1   25            0              0
2   28            0              0
3   30            0              0
4   35            0              0
5   40            1              1
6   48            1              1
7   52            1              1
8   60            1              1
9   65            1              1
10  65            1              1
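The slides do not show the code behind this table. The following is a sketch of how it could be produced, assuming a 0.5 cutoff for turning each model's prediction into a 0 or 1. The data frame name new.ages is hypothetical.

```r
# Same data and models as above
age.data <- data.frame(
  age = c(27,30,32,33,35,40,44,45,50,58,59,60),
  buyer = c(0,0,1,0,0,1,0,1,1,1,1,1))
linear.reg <- lm(buyer ~ age, data = age.data)
logistic.reg <- glm(buyer ~ age, data = age.data, family = 'binomial')

# Ages listed in the table (new.ages is a hypothetical name)
new.ages <- data.frame(age = c(25,28,30,35,40,48,52,60,65,65))

# Linear model: fitted value, then cut at 0.5 (assumed cutoff)
new.ages$buyer.linear <- ifelse(
  predict(linear.reg, newdata = new.ages) >= .5, 1, 0)

# Logistic model: predicted probability, then cut at 0.5 (assumed cutoff)
new.ages$buyer.logistic <- ifelse(
  predict(logistic.reg, newdata = new.ages, type = 'response') >= .5, 1, 0)

new.ages
```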
set.seed(101)
age.data.3 <- data.frame(
age = runif(n = 1000,
min = 20,
max = 65))
linear.reg <- lm(data = age.data, formula = buyer ~ age)
logistic.reg <- glm(data = age.data, formula = buyer ~ age, family = 'binomial')
age.data.3$buyer.linear <- predict(
linear.reg,
newdata = age.data.3,
type = 'response'
)
age.data.3$buyer.linear <- ifelse(
age.data.3$buyer.linear >= .5, 1, 0
)
age.data.3$buyer.logistic <- predict(
logistic.reg,
newdata = age.data.3,
type = 'response'
)
age.data.3$buyer.logistic <- ifelse(
age.data.3$buyer.logistic >= .5, 1, 0
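)
# Flag rows where the two models give different 0/1 predictions
age.data.3$discrepency <- ifelse(
age.data.3$buyer.linear == age.data.3$buyer.logistic, 0, 1
)
# Share of the 1000 simulated ages where the models disagree
round(mean(age.data.3$discrepency), 3)
[1] 0.031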
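To see where those disagreements come from, it can help to compute the age at which each model crosses the 0.5 cutoff. The sketch below does this in base R. For the linear model it solves intercept + slope * age = 0.5. For the logistic model the probability is 0.5 exactly when the log-odds are 0. The names linear.cut and logistic.cut are introduced here for illustration.

```r
# Same data and models as above
age.data <- data.frame(
  age = c(27,30,32,33,35,40,44,45,50,58,59,60),
  buyer = c(0,0,1,0,0,1,0,1,1,1,1,1))
linear.reg <- lm(buyer ~ age, data = age.data)
logistic.reg <- glm(buyer ~ age, data = age.data, family = 'binomial')

b.lin <- coef(linear.reg)
b.log <- coef(logistic.reg)

# Linear: age at which the fitted value equals 0.5
linear.cut <- unname((0.5 - b.lin['(Intercept)']) / b.lin['age'])

# Logistic: age at which the log-odds equal 0 (probability 0.5)
logistic.cut <- unname(-b.log['(Intercept)'] / b.log['age'])

round(c(linear = linear.cut, logistic = logistic.cut), 2)
```

Only ages that fall between the two cutoffs get different 0/1 predictions. With the coefficients reported earlier, the cutoffs are roughly 39.9 (linear) and 38.5 (logistic), a gap of a bit over one year in a 45-year range, which is in line with the small share of discrepancies above.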
In the middle?
When the flip is very strong:
DV can be either 1 or 0
predict()